Abstract

An analysis of the average-case complexity of solving random 3-Satisfiability (SAT)
instances with backtrack algorithms is presented. We first interpret previous rigorous
works in a unifying framework based on the statistical physics notions of dynamical
trajectories, phase diagram and growth process. It is argued that, under the action
of the Davis-Putnam-Loveland-Logemann (DPLL) algorithm, 3-SAT instances are
turned into 2+p-SAT instances whose characteristic parameters (ratio $\alpha$ of clauses
per variable, fraction p of 3-clauses) can be followed during the operation, and
define resolution trajectories. Depending on the location of trajectories in the phase
diagram of the 2+p-SAT model, easy (polynomial) or hard (exponential) resolutions
are generated. Three regimes are identified, depending on the ratio $\alpha$ of the 3-SAT
instance to be solved. Lower sat phase: for small ratios, DPLL almost surely finds
a solution in a time growing linearly with the number N of variables. Upper sat
phase: for intermediate ratios, instances are almost surely satisfiable but finding a
solution requires exponential time ($\sim 2^{N\omega}$ with $\omega > 0$) with high probability. Unsat
phase: for large ratios, there is almost always no solution and proofs of refutation
are exponential. An analysis of the growth of the search tree in both upper sat and
unsat regimes is presented, and allows us to estimate $\omega$ as a function of $\alpha$. This
analysis is based on an exact relationship between the average size of the search
tree and the powers of the evolution operator encoding the elementary steps of the
search heuristic.
1 Introduction.



This paper focuses on the average complexity of solving random 3-SAT
instances using backtrack algorithms. Being an NP-complete problem, 3-SAT
is not thought to be solvable in an efficient way, i.e. in time growing at
most polynomially with N. In practice, one therefore resorts to methods that
need, a priori, exponentially large computational resources. One of these
algorithms is the ubiquitous Davis-Putnam-Loveland-Logemann (DPLL) solving
procedure (Davis, Logemann and Loveland, 1962; Gu, Purdom, Franco and Wah,
1997). DPLL is a complete search algorithm based on backtracking; its
operation is briefly recalled in Figure 1. The sequence of assignments of variables
made by DPLL in the course of instance solving can be represented as a search
tree, whose size Q (number of nodes) is a convenient measure of the hardness
of resolution. Some examples of search trees are presented in Figure 2.



In the past few years, much experimental and theoretical progress has been
made on the probabilistic analysis of 3-SAT (Hogg, Huberman and Williams,
1996; Gent, van Maaren and Walsh, 2000). Distributions of random instances
controlled by few parameters are particularly useful in shedding light on
the onset of complexity. An example that has attracted a lot of attention
over the past years is random 3-SAT: all clauses are drawn randomly and
each variable is negated or left unchanged with equal probabilities. Experiments
(Hogg, Huberman and Williams, 1996; Crawford and Auton, 1996; Mitchell, Selman
and Levesque, 1992; Selman and Kirkpatrick, 1994) and theory (Friedgut, 1999;
Dubois, Boufkhad and Mandler, 2000; Dubois et al., 2001) indicate that clauses
can almost surely always (respectively never) be simultaneously satisfied if
$\alpha$ is smaller (resp. larger) than a critical threshold $\alpha_C \simeq 4.3$ as soon as the
numbers M of clauses and N of variables go to infinity at a fixed ratio $\alpha$.
This phase transition (Monasson et al., 1999) is accompanied by a drastic peak
in hardness at threshold (Hogg, Huberman and Williams, 1996; Mitchell, Selman
and Levesque, 1992; Crawford and Auton, 1996). The emerging pattern of
complexity is as follows. At small ratios $\alpha < \alpha_L$, where $\alpha_L$ depends on the
heuristic used by DPLL, instances are almost surely satisfiable (sat), see
Franco (2001) and Achlioptas (2001b) for recent reviews. The size Q of the
associated search tree scales, with high probability, linearly with the number
N of variables, and almost no backtracking is present (Frieze and Suen, 1996)
(Figure 2A). Above the critical ratio, that is when $\alpha > \alpha_C$, instances are
a.s. unsatisfiable (unsat) and proofs of refutation are obtained through
massive backtracking (Figure 2B), leading to an exponential hardness:
$Q = 2^{N\omega}$ with $\omega > 0$ (Chvátal and Szemerédi, 1988). In the intermediate
range, $\alpha_L < \alpha < \alpha_C$, finding a solution a.s. requires exponential effort
($\omega > 0$) (Coarfa et al., 2000; Achlioptas, Beame and Molloy, 2001c; Cocco and
Monasson, 2001).



The aim of this article is two-fold. First, we propose a simple and intuitive
framework to unify the above findings. This framework is presented in Section
2. It is based on the statistical physics notions of dynamical trajectories and
phase diagram, and was, to some extent, implicitly contained in the pioneering
analysis of search heuristics by Chao and Franco (1986, 1990). Secondly, we
present in Section 3 a quantitative study of the growth of the search tree in
the unsat regime. Such a study has been lacking so far due to the formidable
difficulty in taking into account the effect of massive backtracking on the
operation of DPLL. We first establish an exact relationship between the average
size of the search tree and the powers of the evolution operator encoding the
elementary steps of the search heuristic. This equivalence is then used (in
a non rigorous way) to accurately estimate the logarithm $\omega$ of the average
complexity Q as a function of $\alpha$,

$$\omega(\alpha) = \lim_{N \to \infty} \frac{1}{N} \log_2 E_{(\alpha,N)}[Q] , \qquad (1)$$

where $E_{(\alpha,N)}$ denotes the expectation value for given N and $\alpha$. The approach
emphasizes the relevance of partial differential equations to analyse algorithms
in the presence of massive backtracking, as opposed to ordinary differential
equations in the absence of the latter (Wormald, 1995; Achlioptas, 2001b). In
Section 4, we focus upon the upper sat regime, i.e. upon ratios $\alpha_L < \alpha < \alpha_C$.
Combining the framework of Section 2 and the analysis of Section 3 we unveil
the structure of the search tree (Figure 2C) and calculate $\omega$ as a function of
the ratio $\alpha$ of the 3-SAT instance to be solved.



For the sake of clarity, and since the style of our approach may look unusual
to readers from computer science, the status of the different calculations and
results (experimental, exact, conjectured, approximate, ...) is made explicit
throughout the article.



2 Phase diagram and trajectories. 



2.1 The 2+p-SAT distribution and split heuristics 



The action of DPLL on an instance of 3-SAT causes changes to the overall
numbers of variables and clauses, and thus of the ratio $\alpha$. Furthermore, DPLL
reduces some 3-clauses to 2-clauses. A mixed 2+p-SAT distribution, where p
(1) Choose a variable and its value (T, F) according to
    some heuristic rule (Split);

(2) Analyze the implications of the choice on all the clauses:

    a: If all clauses are satisfied, then stop: a solution is found.
    b: If a contradiction appears, negate the last chosen
       variable and go to 2 (Backtracking).
       If all previously chosen variables have already been
       negated once, then stop: unsatisfiability is proven.
    c: If there is at least one clause with one variable, fix the
       variable to satisfy the clause and go to 2 (Unit Propagation).
    d: Else go to 1.



Fig. 1. DPLL algorithm. When a variable has been chosen at step (1), e.g. x = T,
at step (2) some clauses are satisfied, e.g. C = (x OR y OR z), and eliminated; others
are reduced, e.g. C = (not x OR y OR z) → C = (y OR z). If some clauses include
one variable only, e.g. C = (y), the corresponding variable is automatically fixed to
satisfy the clause (y = T). This propagation (2c) is repeated up to the exhaustion of
all unit clauses. Contradictions result from the presence of two opposite unit clauses,
e.g. C = (y), C' = (not y). A solution is found when no clauses are left. The search
process of DPLL is represented by a tree (Figure 2) whose nodes correspond to (1),
and edges to (2). Branch extremities are marked with contradictions C (Figures 2B,
2C), or by a solution S (Figures 2A, 2C).

is the fraction of 3-clauses, can be used to model what remains of the input
instance at a node of the search tree. Using experiments and methods from
statistical mechanics (Monasson et al., 1999), the threshold line $\alpha_C(p)$,
separating sat from unsat phases, may be estimated, with the results shown in
Figure 3. For $p < p_0 = 2/5$, i.e. to the left of point T, the threshold line is given
by $\alpha_C(p) = 1/(1-p)$, as rigorously confirmed by Achlioptas et al. (2001a),
and saturates the upper bound for the satisfaction of 2-clauses. Above $p_0$, no
exact value for $\alpha_C(p)$ is known. Note that $\alpha_C \simeq 4.3$ corresponds to p = 1.

The phase diagram of 2+p-SAT is the natural space in which the DPLL dynamics
takes place. An input 3-SAT instance with ratio $\alpha$ shows up on the right
vertical boundary of Figure 3 as a point of coordinates $(p = 1, \alpha)$. Under the
action of DPLL, the representative point moves away from the 3-SAT axis and
follows a trajectory. This trajectory obviously depends on the split heuristic
followed by DPLL (Figure 1). Possible simple heuristics are (Chao and Franco,
1986, 1990):

• Unit-Clause (UC): randomly pick a literal from a unit clause if any, or
any unset variable otherwise.

• Generalized Unit-Clause (GUC): randomly pick a literal among the
shortest available clauses.
Fig. 2. Types of search trees generated by the DPLL solving procedure on random
3-SAT. A. simple branch: the algorithm easily finds a solution without ever
backtracking. B. dense tree: in the absence of solution, DPLL builds a tree, including
many branches ending with contradictory leaves, before stopping. C. mixed case,
branch + tree: if many contradictions arise before reaching a solution, the resulting
search tree can be decomposed into a single branch followed by a dense tree. G is
the highest node in the tree reached by DPLL through backtracking.

• Short Clause With Majority (SC1): randomly pick a literal among unit
clauses if any; otherwise randomly pick an unset variable $v$, count the
numbers of occurrences $\ell$, $\bar\ell$ of $v$, $\bar v$ in 3-clauses, and choose $v$ (respectively $\bar v$)
if $\ell > \bar\ell$ (resp. $\ell < \bar\ell$). When $\ell = \bar\ell$, $v$ and $\bar v$ are equally likely to be chosen.
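The procedure of Figure 1, combined with the UC or GUC splitting rules listed above, can be sketched in a few lines of Python. This is only a minimal illustration under our own conventions, not the implementation used for the experiments reported in this paper: the names `random_3sat`, `assign`, `dpll` and `solve` are hypothetical, literals are encoded as signed integers, the counter records split nodes only, and SC1 is left out for brevity.

```python
import random

def random_3sat(n, alpha, rng):
    """Draw round(alpha*n) clauses over 3 distinct variables, each negated w.p. 1/2."""
    return [tuple(v if rng.random() < 0.5 else -v
                  for v in rng.sample(range(1, n + 1), 3))
            for _ in range(round(alpha * n))]

def assign(clauses, lit):
    """Set literal `lit` to true. Return the reduced clause list,
    or None if some clause becomes empty (contradiction)."""
    reduced = []
    for clause in clauses:
        if lit in clause:
            continue                      # satisfied clause is eliminated
        if -lit in clause:
            clause = tuple(l for l in clause if l != -lit)
            if not clause:
                return None
        reduced.append(clause)
    return reduced

def dpll(clauses, unset, heuristic, rng, stats):
    """Return True iff `clauses` is satisfiable; stats['splits'] counts split nodes."""
    while clauses:                        # step (2c): unit propagation
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        lit = rng.choice(units)
        clauses = assign(clauses, lit)
        if clauses is None:
            return False                  # step (2b): contradiction on this branch
        unset = unset - {abs(lit)}
    if not clauses:
        return True                       # step (2a): all clauses satisfied
    stats['splits'] += 1                  # step (1): split
    if heuristic == 'GUC':                # literal from one of the shortest clauses
        shortest = min(len(c) for c in clauses)
        lit = rng.choice(rng.choice([c for c in clauses if len(c) == shortest]))
    else:                                 # 'UC': any unset variable, random value
        var = rng.choice(sorted(unset))
        lit = var if rng.random() < 0.5 else -var
    rest = unset - {abs(lit)}
    for choice in (lit, -lit):            # the negated value is tried on backtracking
        reduced = assign(clauses, choice)
        if reduced is not None and dpll(reduced, rest, heuristic, rng, stats):
            return True
    return False

def solve(clauses, n, heuristic='GUC', seed=0):
    """Run DPLL on an instance over variables 1..n; return (satisfiable, splits)."""
    stats = {'splits': 0}
    sat = dpll(list(clauses), frozenset(range(1, n + 1)), heuristic,
               random.Random(seed), stats)
    return sat, stats['splits']
```

A call such as `solve(random_3sat(50, 3.0, random.Random(1)), 50, 'GUC')` returns the satisfiability status together with the number of split nodes, which plays the role of the tree size Q up to the nodes added by unit-propagation. The recursion depth grows with N, so this sketch is meant for small instances only.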

Rigorous mathematical analyses, undertaken to provide rigorous bounds to the
critical threshold $\alpha_C$, have so far been restricted to the action of DPLL prior
to any backtracking, that is, to the first descent of the algorithm in the search
tree^1. The corresponding search branch is drawn in Figure 2A. These studies
rely on the two following facts:

First, the representative point of the instance treated by DPLL does not
"leave" the 2+p-SAT phase diagram. In other words, the instance is, at any
stage of the search process, uniformly drawn from the 2+p-SAT distribution
conditioned on its clause-per-variable ratio $\alpha$ and fraction of 3-clauses
p. This assumption is not true for all split heuristics, but holds for the
above examples (UC, GUC, SC1) (Chao and Franco, 1986). Analyses of more
sophisticated heuristics require handling more complex instance distributions
(Kaporis, Kirousis and Lalas, 2002).

Secondly, the trajectory followed by an instance in the course of resolution is
a stochastic object, due to the randomness of the instance and of the
assignments done by DPLL. In the large size limit ($N \to \infty$), this trajectory gets
concentrated around its average locus in the 2+p-SAT phase diagram. This
concentration phenomenon results from general properties of Markov chains
(Wormald, 1995; Achlioptas, 2001b).

^1 The analysis of Frieze and Suen (1996) however includes a very limited version
of backtracking, see Section 2.2.



2.2 Trajectories associated to search branches 



Let us briefly recall the Chao and Franco (1986) analysis of the average
trajectory corresponding to the action of DPLL prior to backtracking. The ratio of
clauses per variable of the 3-SAT instance to be solved will be denoted by $\alpha_0$.
The numbers of 2- and 3-clauses are initially equal to $C_2 = 0$, $C_3 = \alpha_0 N$
respectively. Under the action of DPLL, $C_2$ and $C_3$ follow a Markovian stochastic
evolution process, as the depth T along the branch (number of assigned
variables) increases. Both $C_2$ and $C_3$ are concentrated around their expectation
values, the densities $c_j(t) = E[C_j(T = tN)]/N$ (j = 2, 3) of which obey a set of
coupled ordinary differential equations (ODE) (Chao and Franco, 1986, 1990;
Achlioptas, 2001b),

$$\frac{dc_3}{dt} = -\frac{3 c_3}{1-t} , \qquad \frac{dc_2}{dt} = \frac{3 c_3}{2(1-t)} - \frac{2 c_2}{1-t} - \rho_1(t)\, h(t) , \qquad (2)$$

where $\rho_1(t) = 1 - c_2(t)/(1-t)$ is the probability that DPLL fixes a variable
at depth t (fraction of assigned variables) through unit-propagation. Function
h depends upon the heuristic: $h_{UC}(t) = 0$, $h_{GUC}(t) = 1$ (if $\alpha_0 > 2/3$; for
$\alpha_0 < 2/3$, see Chao and Franco (1990)), $h_{SC_1}(t) = a\, e^{-a} (I_0(a) + I_1(a))/2$
where $a = 3 c_3(t)/(1-t)$ and $I_j$ is the $j$-th modified Bessel function. To obtain
the single branch trajectory in the phase diagram of Figure 3, we solve ODEs
(2) with initial conditions $c_2(0) = 0$, $c_3(0) = \alpha_0$, and perform the change of
variables

$$p(t) = \frac{c_3(t)}{c_2(t) + c_3(t)} , \qquad \alpha(t) = \frac{c_2(t) + c_3(t)}{1-t} . \qquad (3)$$

Results are shown for the GUC heuristic and starting ratios $\alpha_0 = 2$ and 2.8
in Figure 3. The trajectory, indicated by a light dashed line, first heads to
the left and then reverses to the right until reaching a point on the 3-SAT
axis at a small ratio. Further action of DPLL leads to a rapid elimination of
the remaining clauses and the trajectory ends up at the right lower corner S,
where a solution is found.
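The branch trajectory can be obtained numerically from the ODEs above. The following is a minimal sketch, assuming the GUC choice h(t) = 1 (valid for initial ratios above 2/3), a fixed-step fourth-order Runge-Kutta integrator of our own choosing, and the mapping p = c3/(c2+c3), alpha = (c2+c3)/(1-t) that follows from the definitions of p and of the clause-per-variable ratio. The names `guc_rhs`, `guc_trajectory` and `c2_closed_form` are hypothetical; the closed form is our own integration of the equations and is used only as a consistency check.

```python
import math

def guc_rhs(t, c2, c3):
    """Right-hand side of the ODEs for c2, c3 with h(t) = 1 (GUC, alpha_0 > 2/3)."""
    rho1 = 1.0 - c2 / (1.0 - t)           # factor multiplying h(t)
    dc3 = -3.0 * c3 / (1.0 - t)
    dc2 = 3.0 * c3 / (2.0 * (1.0 - t)) - 2.0 * c2 / (1.0 - t) - rho1
    return dc2, dc3

def guc_trajectory(alpha0, dt=1e-3):
    """Integrate from c2(0) = 0, c3(0) = alpha0 with RK4 steps.
    Returns a list of (t, p, alpha); stops when c2 returns to zero."""
    t, c2, c3 = 0.0, 0.0, alpha0
    points = [(t, 1.0, alpha0)]
    while t + dt < 1.0:
        k1 = guc_rhs(t, c2, c3)
        k2 = guc_rhs(t + dt / 2, c2 + dt / 2 * k1[0], c3 + dt / 2 * k1[1])
        k3 = guc_rhs(t + dt / 2, c2 + dt / 2 * k2[0], c3 + dt / 2 * k2[1])
        k4 = guc_rhs(t + dt, c2 + dt * k3[0], c3 + dt * k3[1])
        c2 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        c3 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
        if c2 <= 0.0:                     # back on the 3-SAT axis (p = 1)
            break
        points.append((t, c3 / (c2 + c3), (c2 + c3) / (1.0 - t)))
    return points

def c2_closed_form(alpha0, t):
    """Our own closed-form solution for c2(t) when h = 1, with c3 = alpha0*(1-t)^3."""
    u = 1.0 - t
    return u * (0.75 * alpha0 * (1.0 - u * u) + math.log(u))
```

For a starting ratio of 2 the quantity alpha(1-p) = c2/(1-t) stays below 1 along the whole trajectory, that is, below the threshold line 1/(1-p). For a starting ratio of 3.003 its maximum comes out close to 1 near (p, alpha) = (2/5, 5/3), in line with the tangency point T described in the caption of Figure 3.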
Fig. 3. Phase diagram of 2+p-SAT and dynamical trajectories of DPLL. The
threshold line $\alpha_C(p)$ (bold full line) separates sat (lower part of the plane) from unsat
(upper part) phases. Extremities lie on the vertical 2-SAT (left) and 3-SAT (right) axes
at coordinates $(p = 0, \alpha_C = 1)$ and $(p = 1, \alpha_C \simeq 4.3)$ respectively. Departure points
for DPLL trajectories are located on the 3-SAT vertical axis and the corresponding
values of $\alpha$ are explicitly given. Dashed curves represent tree trajectories in the
unsat region (thick lines, black arrows) and branch trajectories in the sat phase (thin
lines, empty arrows). Arrows indicate the direction of "motion" along trajectories
parametrized by the fraction t of variables set by DPLL. For small ratios $\alpha < \alpha_L$,
branch trajectories remain confined in the sat phase and end in S of coordinates (1, 0),
where a solution is found. At $\alpha_L$ ($\simeq 3.003$ for the GUC heuristic), the single branch
trajectory hits tangentially the threshold line in T of coordinates (2/5, 5/3). In the
intermediate range $\alpha_L < \alpha < \alpha_C$, the branch trajectory intersects the threshold
line at some point G (which depends on $\alpha$). A dense tree then grows in the unsat
phase, as happens when 3-SAT departure ratios are above threshold, $\alpha > \alpha_C \simeq 4.3$.
The tree trajectory halts on the dot-dashed curve $\alpha \simeq 1.259/(1-p)$ where the tree
growth process stops. At this point, DPLL has reached back the highest
backtracking node in the search tree, that is, the first node when $\alpha > \alpha_C$, or node G for
$\alpha_L < \alpha < \alpha_C$. In the latter case, a solution can be reached from a new descending
branch while, in the former case, unsatisfiability is proven, see Figure 2.



$\alpha_0$        4.3       7         10        15        20        3.5
$\omega_{EXP}$    0.089     0.0477    0.0320    0.0207    0.0153    0.034
                  ±0.001    ±0.0005   ±0.0005   ±0.0002   ±0.0002   ±0.003
$\omega_{THE}$    0.0916              0.0323    0.0207    0.0153

Table 1

Logarithm of the complexity $\omega$ from experiments (EXP) and theory (THE) as a
function of the ratio $\alpha_0$ of clauses per variable of the 3-SAT instance. Ratios above
4.3 correspond to unsat instances; the rightmost ratio lies in the upper sat phase.



Frieze and Suen (1996) have shown that, for ratios $\alpha_0 < \alpha_L \simeq 3.003$ (for the
GUC heuristic), the full search tree essentially reduces to a single branch,
and is thus entirely described by the ODEs (2). The amount of backtracking
necessary to reach a solution is bounded from above by a power of log N. The
average size of the branch, Q, scales linearly with N with a multiplicative
factor $\gamma(\alpha_0) = Q/N$ that can be calculated (Cocco and Monasson, 2001). The
boundary $\alpha_L$ of this easy sat region can be defined as the largest initial ratio
$\alpha_0$ such that the branch trajectory $(p(t), \alpha(t))$ issued from $(1, \alpha_0)$ never leaves
the sat phase during DPLL action. In other words, the instance essentially
keeps being sat throughout the resolution process. We shall see in Section 4 that
this does not hold for sat instances with ratios $\alpha_L < \alpha_0 < \alpha_C$.



3 Analysis of the search tree growth in the unsat phase. 



In this Section, we present an analysis of search trees corresponding to unsat
instances, that is, in the presence of massive backtracking. We first report results
from numerical experiments, then present our analytical approach to compute
the complexity of resolution (size of the search tree).



3.1 Numerical experiments 



For ratios above threshold ($\alpha_0 > \alpha_C \simeq 4.3$), instances almost never have a
solution but a considerable amount of backtracking is necessary before proving
that clauses are incompatible. Figure 2B shows a generic unsat, or refutation,
tree. In contrast to the previous section, the sequence of points $(p, \alpha)$ attached
to the nodes of the search tree does not arrange along a line any longer, but
rather forms a cloud with a finite extension in the phase diagram of Figure 3.
Examples of clouds are provided in Figure 4.
Fig. 4. Clouds associated to search trees obtained from the resolution of three unsat
instances with initial ratios $\alpha_0 = 4.3$, 7 and 10 respectively. Each point in the cloud
corresponds to a splitting node in the search tree. Sizes of instances and search
trees are N = 120, Q = 7597 for $\alpha_0 = 4.3$, N = 200, Q = 6335 for $\alpha_0 = 7$, and
N = 300, Q = 6610 for $\alpha_0 = 10$.

The number of points in a cloud, i.e. the size Q of its associated search tree,
grows exponentially with N (Chvátal and Szemerédi, 1988). It is thus
convenient to define its logarithm $\omega$ through $Q = 2^{N\omega}$. We experimentally measured
Q, and averaged its logarithm $\omega$ over a large number of instances. Results have
then been extrapolated to the $N \to \infty$ limit (Cocco and Monasson, 2001) and
are reported in Table 1. $\omega$ is a decreasing function of $\alpha_0$ (Beame et al.):
the larger $\alpha_0$, the larger the number of clauses affected by a split, and the
earlier a contradiction is detected. We will use the term "branch" to denote
a path in the refutation tree which joins the top node (root) to a
contradiction (leaf). The number of branches, B, is related to the number of nodes, Q,
through the relation Q = B - 1, valid for any complete binary tree. As far
as exponential (in N) scalings are concerned, the logarithm of B (divided by
N) equals $\omega$. In the following paragraph, we show how B can be estimated
through the use of a matrix formalism.
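As a rough illustration of the definition of the logarithm of the complexity, the sketch below evaluates log2(Q)/N for the three single instances quoted in the caption of Figure 4. These are finite-size, single-instance numbers, not the instance-averaged and extrapolated values of Table 1; the function name `omega_estimate` is ours.

```python
import math

def omega_estimate(n_vars, tree_size):
    """Finite-size estimate of omega from Q = 2^(N * omega)."""
    return math.log2(tree_size) / n_vars

# (alpha_0, N, Q) as quoted in the caption of Figure 4
fig4 = [(4.3, 120, 7597), (7.0, 200, 6335), (10.0, 300, 6610)]
estimates = {a0: omega_estimate(n, q) for a0, n, q in fig4}
for a0, w in estimates.items():
    print(f"alpha_0 = {a0}: log2(Q)/N = {w:.4f}")
```

The three values decrease as the initial ratio increases, which is the trend of Table 1. They lie above the extrapolated values reported there, which is not surprising for single instances at moderate N.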



3.2 Parallel growth process and Markovian evolution matrix 



The probabilistic analysis of DPLL in the unsat regime appears to be a
formidable task since the search tree of Figure 2B is the output of a complex,
sequential process: nodes and edges are added by DPLL through successive
descents and backtrackings (depth-first search). We have imagined a different
Fig. 5. Imaginary, parallel growth process of an unsat search tree used in the
theoretical analysis. Variables are fixed through unit-propagation, or by the splitting
heuristic as in the DPLL procedure, but branches evolve in parallel. T denotes the depth
in the tree, that is the number of variables assigned by DPLL along each branch. At
depth T, one literal is chosen on each branch among 1-clauses (unit-propagation,
grey circles not represented on Figure 2), or 2,3-clauses (splitting, black circles as
in Figure 2). If a contradiction occurs as a result of unit-propagation, the branch
gets marked with C and dies out. The growth of the tree proceeds until all branches
carry C leaves. The resulting tree is identical to the one built through the usual,
sequential operation of DPLL.

building up of the refutation tree, which results in the same complete tree
but can be mathematically analyzed. In our imaginary process (Figure 5), the
tree grows in parallel, layer after layer (breadth-first search). At time T = 0,
the tree reduces to a root node, to which is attached the 3-SAT instance to
be solved, and an attached outgoing edge. At time T, that is, after having
assigned T variables in the instance attached to each branch, the tree is made
of $B(T)$ ($\le 2^T$) branches, each one carrying a partial assignment of variables.
At the next time step $T \to T + 1$, a new layer is added by assigning, according
to the DPLL heuristic, one more variable along every branch. As a result, a
branch may keep growing through unit-propagation, get hit by a
contradiction and die out, or split if the partial assignment does not induce unit
clauses. This parallel growth process is Markovian, and can be encoded in an
instance-dependent matrix we now construct.

To do so, we need some preliminary definitions: 

Definition 1 Partial state of variables.

The partial state s of a Boolean variable x is one of the three following
possibilities: undetermined (u) if the variable has not been assigned by the search
heuristic yet, true (t) if the variable is partially assigned to true, false (f)
if the variable is partially assigned to false. The partial state S of a set of
Boolean variables $X = \{x_1, x_2, \ldots, x_N\}$ is the collection of the states of its
elements, $S = (s_1, s_2, \ldots, s_N)$.

Let I be an instance of the SAT problem, defined over a set of Boolean variables
X with partial state S. A clause of I is said to be

• satisfied if at least one of its literals is true according to S;

• unsatisfied, or violated, if all its literals are false according to S;

• undetermined otherwise; then its 'type' is the number (= 1, 2, 3) of
undetermined variables it includes.

The instance I is said to be satisfied if all its clauses are satisfied, unsatisfied
if one (at least) of its clauses is violated, undetermined otherwise. The set of
partial states that violate I is denoted by W.

Definition 2 Vector space attached to a variable.

To each Boolean variable x is associated a three dimensional vector space v
with spanning basis $|u\rangle$, $|t\rangle$, $|f\rangle$, orthonormal with respect to the dot (inner)
product denoted by $\langle . | . \rangle$,

$$\langle u|u\rangle = \langle t|t\rangle = \langle f|f\rangle = 1 , \qquad \langle u|t\rangle = \langle u|f\rangle = \langle t|f\rangle = 0 . \qquad (4)$$

The partial state attached to a basis vector $|s\rangle$ is s (= u, t, f).

Letters u, t, f stand for the different partial states the variable may acquire in
the course of the search process. Note that the coefficients of the decomposition
of any vector $|x\rangle \in v$ over the spanning basis,

$$|x\rangle = x^{(u)} |u\rangle + x^{(t)} |t\rangle + x^{(f)} |f\rangle , \qquad (5)$$

can be obtained through use of the dot product: $x^{(s)} = \langle s|x\rangle$ with s = u, t, f.
By extension, $\langle S|$ denotes the transpose of vector $|S\rangle$.

Definition 3 Vector space attached to a set of variables.

We associate to the set $X = \{x_1, x_2, \ldots, x_N\}$ of N Boolean variables the
$3^N$-dimensional vector space $V = v_1 \otimes v_2 \otimes \ldots \otimes v_N$. The spanning basis of V is
the tensor product of the spanning bases of the $v_j$'s. To lighten notations, we
shall write $|s_1, s_2, \ldots, s_N\rangle$ for $|s_1\rangle \otimes |s_2\rangle \otimes \ldots \otimes |s_N\rangle$. The partial state attached
to a basis vector $|S\rangle = |s_1, s_2, \ldots, s_N\rangle$ is $S = (s_1, s_2, \ldots, s_N)$. The dot product
naturally extends over V: $\langle s'_1, s'_2, \ldots, s'_N | s_1, s_2, \ldots, s_N \rangle = 1$ if $s_j = s'_j$ for
all j, and 0 otherwise.

Any element $|X\rangle \in V$ can be uniquely decomposed as a linear combination
of vectors from the spanning basis. Two examples of vectors are $|\Sigma\rangle$ and $|U\rangle$,
respectively the sum of all vectors in the spanning basis and the fully
undetermined vector,

$$|\Sigma\rangle = (|u\rangle + |t\rangle + |f\rangle) \otimes (|u\rangle + |t\rangle + |f\rangle) \otimes \ldots \otimes (|u\rangle + |t\rangle + |f\rangle) , \qquad (6)$$
$$|U\rangle = |u, u, \ldots, u\rangle . \qquad (7)$$

Basis vectors fulfill the closure identity

$$\sum_S |S\rangle \langle S| = 1 , \qquad (8)$$

where 1 is the identity operator on V. To establish identity (8), apply the left
hand side operator to any vector $|S'\rangle$ and take advantage of the orthonormality
of the spanning basis.

Definition 4 (Heuristic-induced) Transition probabilities

Let $S = (s_1, s_2, \ldots, s_N)$ be a partial state which does not violate instance I.
Call $S^{(j,x)}$, with j = 1, ..., N and x = t, f, the partial state obtained from S
by replacing $s_j$ with x. The probability that the heuristic under consideration
(UC, GUC, ...) chooses to assign variable $x_j$ when presented partial state S is
denoted by $h(j|S)$. The probability that the heuristic under consideration then
fixes variable $x_j$ to x (= t, f) is denoted by $g(x|S,j)$.

A few elementary facts about transition probabilities are:

(1) $h(j|S) = 0$ if $s_j \neq u$.

(2) $g(t|S,j) + g(f|S,j) = 1$.

(3) Assume that the number $C_1(S)$ of undetermined clauses of type 1 (unit
clauses) is larger or equal to unity. Call $C_1(j|S)$ the number of unit clauses
containing variable $x_j$, and $C_1(x|S,j)$ the number of unit clauses satisfied
if $x_j$ equals x (= t, f). Clearly $C_1(j|S) = C_1(t|S,j) + C_1(f|S,j)$. Then,
as a result of unit-propagation,

$$g(x|S,j) = \frac{C_1(x|S,j)}{C_1(j|S)} \quad \text{for } x = t, f \text{ and } C_1(j|S) \ge 1 . \qquad (9)$$

(4) In the absence of unit clauses ($C_1(S) = 0$), transition probabilities
depend on the details of the heuristic. For instance, in the case of the UC
heuristic,

(a) if $s_j = u$, $h(j|S) = 1/u(S)$ and $g(x|S,j) = 1/2$,

(b) if $s_j \neq u$, $h(j|S) = 0$,

where $u(S)$ is the number of undetermined variables in partial state S.

(5) The sum of transition probabilities from a partial state S is equal to
unity,

$$\sum_{j=1}^{N} h(j|S) \left[ g(t|S,j) + g(f|S,j) \right] = 1 . \qquad (10)$$

It is important to stress that the definition of the transition probabilities does
not make reference to any type of backtracking. It relies only on the notion of
variable assignment through the search heuristic.

Let us now introduce the 

Definition 5 (Heuristic-induced) Evolution operator.

The evolution operator is a linear operator H acting on V encoding the
action of DPLL for a given unsatisfiable instance I. Its matrix elements in the
spanning basis are

(1) if S violates I,

$$\langle S'|H|S\rangle = \begin{cases} 1 & \text{if } S' = S \\ 0 & \text{if } S' \neq S \end{cases} \qquad (11)$$

(2) if S does not violate I,

$$\langle S'|H|S\rangle = \begin{cases} h(j|S) \times g(x|S,j) & \text{if } C_1(S) \ge 1 \text{ and } S' = S^{(j,x)} \\ h(j|S) & \text{if } C_1(S) = 0 \text{ and } (S' = S^{(j,t)} \text{ or } S' = S^{(j,f)}) \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

where S, S' are the partial states attached to $|S\rangle$, $|S'\rangle$, and $C_1(S)$ is the number
of undetermined clauses of type 1 (unit clauses) for partial state S.

Notice that we use the same notation, H, for the operator and its matrix in
the spanning basis. The different cases encountered in the above definition of
H are symbolized in Figure 6. We may now conclude:

Theorem 6 Branch function and average size of refutation tree

Call branch function the function B with integer-valued argument T,

$$B(T) = \langle \Sigma | H^T | U \rangle , \qquad (13)$$

where H is the evolution operator associated to the unsatisfiable instance I,
$H^T$ denotes the $T$-th (matricial) power of H, and vectors $|\Sigma\rangle$, $|U\rangle$ are defined
in (6,7). Then, there exist two instance-dependent integers $T^*$ ($\le N$) and
$B^*$ ($\le 2^N$) such that

$$B(T) = B^* , \quad \forall\, T \ge T^* . \qquad (14)$$

Furthermore, $B^*$ is the expectation value over the random assignments of
variables of the size (number of leaves) of the search tree produced by DPLL to
refute I. The smallest non zero $T^*$ for which (14) holds is the largest number
of variables that the heuristic needs to assign to reach a contradiction.
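To make the statement concrete, the branch function can be evaluated by brute force on instances with very few variables. The sketch below applies the evolution operator with the UC heuristic to a dictionary of weighted partial states, starting from the fully undetermined state. It rests on assumptions of ours: partial states are tuples over 'u', 't', 'f', literals are signed integers, and, when unit clauses are present, one of them is picked uniformly at random, which amounts to taking h(j|S) equal to the fraction of unit clauses containing variable j (consistent with the unit-propagation rule for g, but not spelled out in the text). The names `clause_status`, `apply_H` and `branch_function` are hypothetical, and the instance is assumed to be unsatisfiable.

```python
def clause_status(clause, state):
    """Return 'sat', 'viol', or the tuple of undetermined literals of the clause."""
    free = []
    for lit in clause:
        s = state[abs(lit) - 1]
        if s == 'u':
            free.append(lit)
        elif (s == 't') == (lit > 0):
            return 'sat'
    return tuple(free) if free else 'viol'

def set_var(state, lit):
    """Partial state obtained by making literal `lit` true."""
    j = abs(lit) - 1
    return state[:j] + ('t' if lit > 0 else 'f',) + state[j + 1:]

def apply_H(clauses, weights):
    """One application of the evolution operator (UC heuristic) to a weighted
    collection of partial states, given as {state: weight}."""
    new = {}
    for state, w in weights.items():
        status = [clause_status(c, state) for c in clauses]
        if 'viol' in status:
            moves = [(state, 1.0)]                       # violating state: unchanged
        else:
            units = [s[0] for s in status if isinstance(s, tuple) and len(s) == 1]
            if units:                                    # unit-propagation, no split
                moves = [(set_var(state, lit), 1.0 / len(units)) for lit in units]
            else:                                        # split: both values are kept
                free = [j + 1 for j, s in enumerate(state) if s == 'u']
                moves = [(set_var(state, sign * j), 1.0 / len(free))
                         for j in free for sign in (1, -1)]
        for target, prob in moves:
            new[target] = new.get(target, 0.0) + w * prob
    return new

def branch_function(clauses, n, t_max):
    """Return [B(0), B(1), ..., B(t_max)], B(T) being the total weight after T steps."""
    weights = {('u',) * n: 1.0}
    values = [1.0]
    for _ in range(t_max):
        weights = apply_H(clauses, weights)
        values.append(sum(weights.values()))
    return values
```

For the single-variable instance made of the two opposite unit clauses, `branch_function([(1,), (-1,)], 1, 2)` stays equal to 1: the tree is a single branch ending in a contradiction. For the four 2-clauses over two variables, `branch_function([(1, 2), (1, -2), (-1, 2), (-1, -2)], 2, 3)` saturates at 2 from T = 1 onwards, one split followed by two contradictory leaves. The cost is exponential in N, so the approach is only meant as a check on toy instances.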



Proof of Theorem 6

Let S be a partial state. We call refutation tree built from S a complete search
tree that proves the unsatisfiability of I conditioned on the fact that DPLL is
allowed to assign only variables which are undetermined in S. The height of
the search tree is the maximal number of assignments leading from the root
node (attached to partial state S) to a contradictory leaf.

Let T be a positive integer. We call $b_T(S)$ the average size (number of leaves)
of refutation trees of height $\le T$ that can be built from partial state S. Clearly,
$b_T(S) = 1$ for all $S \in W$, and $b_T(S) \ge 2$ if $S \notin W$. Recall that W is the set of
violating partial states from Definition 1.

Assume now that T is an integer larger or equal to 1, and S a partial state with
$C_1(S)$ unit clauses. Our parallel representation of DPLL allows us to write simple
recursion relations:

(1) if $S \in W$, $b_T(S) = 1 = b_{T-1}(S)$.

(2) if $S \notin W$ and $C_1(S) \ge 1$,

$$b_T(S) = \sum_{j=1}^{N} \sum_{x=t,f} h(j|S)\, g(x|S,j)\, b_{T-1}(S^{(j,x)}) . \qquad (15)$$

(3) if $S \notin W$ and $C_1(S) = 0$,

$$b_T(S) = \sum_{j=1}^{N} h(j|S) \left[ b_{T-1}(S^{(j,t)}) + b_{T-1}(S^{(j,f)}) \right] . \qquad (16)$$

These three different cases are symbolized in Figure 6A, B and C respectively.
From definitions (11,12), these recursion relations are equivalent to

$$b_T(S) = \sum_{S'} \langle S'|H|S\rangle\, b_{T-1}(S') , \qquad (17)$$
for any partial state S. Let $|b_T\rangle$ be the vector of V whose coefficients on the
spanning basis $\{|S\rangle\}$ are the $b_T(S)$'s. In particular,

$$|b_0\rangle = \sum_{S_0 \in W} |S_0\rangle . \qquad (18)$$

Then identity (17) can be written as $|b_T\rangle = H^\dagger |b_{T-1}\rangle$ where $H^\dagger$ is the
transpose of the evolution operator. Note that the branch function (13) is simply
$B(T) = \langle U|b_T\rangle$. We deduce

$$|b_T\rangle = (H^\dagger)^T |b_0\rangle = \sum_{S_0 \in W} \sum_{\sigma_T} p(\sigma_T; S_0)\, |S_T\rangle , \qquad (19)$$

where the second sum runs over all $3^{NT}$ sequences $\sigma_T = (S_1, S_2, \ldots, S_{T-1}, S_T)$
of T partial states with associated weight

$$p(\sigma_T; S_0) = \langle S_T|H^\dagger|S_{T-1}\rangle \times \ldots \times \langle S_2|H^\dagger|S_1\rangle \times \langle S_1|H^\dagger|S_0\rangle$$
$$\qquad\qquad = \langle S_0|H|S_1\rangle \times \langle S_1|H|S_2\rangle \times \ldots \times \langle S_{T-1}|H|S_T\rangle . \qquad (20)$$

The length of a sequence is the number of partial states it includes. We call
$S_0$-genuine a sequence of partial states $\sigma_T$ with non zero weight (20). The
second sum on the right hand side of equation (19) may be rewritten as a sum
over all $S_0$-genuine sequences $\sigma_T$ of length T only.

Lemma 7 Take $S_0 \in W$. Any $S_0$-genuine sequence $\sigma_{N+1}$ of length N + 1
includes at least one partial state belonging to W.

Suppose this is not true. There exists a genuine sequence $\sigma_{N+1}$ with $S_T \notin W$,
$\forall\ 1 \le T \le N+1$. Call $u_T$ the number of undetermined variables in partial state
$S_T$. Since the sequence is genuine, $\langle S_{T-1}|H|S_T\rangle \neq 0$ for every T comprised
between 1 and N + 1. From the evolution operator definition (12), $S_T$ contains
exactly one more undetermined variable than $S_{T-1}$, and $u_T = u_{T-1} + 1$ for all
$1 \le T \le N+1$. Hence $u_{N+1} - u_0 = N + 1$. But $u_0$ and $u_{N+1}$ are, by definition,
integer numbers comprised between 0 and N. □

From Lemma 7, the index $\nu$ of a $S_0$-genuine sequence $\sigma_{N+1}$ of length N + 1,

$$\nu = \sup \{ T : 1 \le T \le N+1 \text{ and } S_T \in \sigma_{N+1} \text{ and } S_T \in W \} , \qquad (21)$$

exists and is larger, or equal, to 1. Let us define

$$\sigma'_{N+1} = (S_{\nu+1}, S_{\nu+2}, \ldots, S_N, S_{N+1}) . \qquad (22)$$

From definition (11), $\sigma_{N+1}$ is simply $S_0$ repeated $\nu$ times followed by $\sigma'_{N+1}$, and
$p(\sigma_{N+1}) = p(\sigma'_{N+1})$. Call $\nu^*(S_0)$ the smallest index of all $S_0$-genuine sequences
of length N + 1, and $\nu^*$ the minimum of $\nu^*(S_0)$ over $S_0 \in W$. Then, from
equation (19), $|b_{N+1}\rangle = |b_N\rangle = \ldots = |b_{T^*}\rangle$ where $T^* = N + 1 - \nu^* \le N$.
Thus $|b_{T^*}\rangle$ is a right eigenvector of $H^\dagger$ with eigenvalue unity, and $|b_T\rangle = |b_{T^*}\rangle$
for all $T \ge T^*$. $T^*$, which depends upon instance I, is the length of the
longest genuine sequence without repetition. It is the maximal number of
(undetermined) variables to be fixed before a contradiction is found.

Lemma 8 Take $S \notin W$. Then there is no $S$-genuine sequence of length $T^*$.

Suppose this is not true. There exist $S \notin W$ and a $S$-genuine sequence $\sigma_{T^*}$
of length $T^*$. As $S$ does not violate $I$, and $I$ is not satisfiable, there are still
some undetermined variables in partial state $S$. A certain number of them,
say $T' \ge 1$, must be assigned to some $t$, $f$ values to reach a contradiction, that
is, a partial state $S_0 \in W$. Therefore there exists a $S_0$-genuine sequence, $\sigma'$, of
length $T' \ge 1$ ending with $S$ and with no repeated partial state. Concatenating
$\sigma'$ and $\sigma_{T^*}$, we obtain a $S_0$-genuine sequence of length $T^* + T' > T^*$ and without
repetition, in contradiction with the above result. □

Using Lemma 8, we may replace $|b_0\rangle$ in equation (19) with $|\Sigma\rangle$, and find

$$B(T) = \langle \Sigma|\mathbf{H}^T|U\rangle = \langle U|(\mathbf{H}^\dagger)^T|\Sigma\rangle = \langle U|b_{T^*}\rangle = b_{T^*}(U) , \tag{23}$$

for all $T \ge T^*$. Hence, $B^* = b_{T^*}(U)$ is the average size (over the random
assignments made by the heuristic) of the refutation tree of instance $I$ generated
from the fully undetermined partial state. □



3.3 Some examples of short instances and associated matrices 

We illustrate the above definitions and results with three explicit examples of 
instances involving few variables: 

Example 9 Instance over $N = 1$ variable

Consider the following unsat instance built from a single variable,

$$I_1 = x_1 \wedge \bar{x}_1 . \tag{24}$$

The 3-dimensional vector space $V_1$ is spanned by vectors $|u\rangle$, $|t\rangle$, $|f\rangle$. The evolution
matrix reads

$$\mathbf{H} = \begin{pmatrix} 0 & 0 & 0 \\ 1/2 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix}
\quad \text{with} \quad
|u\rangle = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} , \quad
|t\rangle = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} , \quad
|f\rangle = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} . \tag{25}$$

Fig. 6. Transitions allowed by the heuristic-induced evolution operator. Grey and
black nodes correspond to variables assigned through unit-propagation and splitting
respectively, as in Figure 5. A. If the partial state $S$ already violates the instance
$I$, it is left unchanged. B. If the partial state does not violate $I$ and there is at least
one unitary clause, a variable is fixed through unit propagation (grey node) e.g.
$x_j = x$. The output partial state is $S^{(j,x)}$. C. If the partial state does not violate $I$
and there is no unitary clause, a variable $x_j$ is fixed through splitting (black node).
Two partial states are generated, $S^{(j,t)}$ and $S^{(j,f)}$.



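Entries can be interpreted as follows. Starting from the $u$ state, variable $x_1$ will
be set through unit-propagation to $t$ or $f$ with equal probabilities: $\langle t|\mathbf{H}|u\rangle =
\langle f|\mathbf{H}|u\rangle = 1/2$. Once the variable has reached this state, the instance is
violated: $\langle t|\mathbf{H}|t\rangle = \langle f|\mathbf{H}|f\rangle = 1$. All other entries are null. In particular, state $u$
can never be reached from any state, so the first line of the matrix is filled in
with zeroes: $\langle u|\mathbf{H}|s\rangle = 0$, $\forall s$. Function (13) is easily calculated

$$B(T) = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} \cdot
\begin{pmatrix} 0 & 0 & 0 \\ 1/2 & 1 & 0 \\ 1/2 & 0 & 1 \end{pmatrix}^{T} \cdot
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = 1 , \quad \forall\, T \ge 0 . \tag{26}$$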

Therefore, $T^* = B^* = 1$. Indeed, refutation is obtained without any split, and
the search tree involves a unique branch of length 1 (Figure 7A).

Our next example is a 2-SAT instance whose refutation requires splitting one
variable.



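Example 10 Instance over $N = 2$ variables, with a unique refutation tree.

$$I_2 = (x_1 \vee x_2) \wedge (\bar{x}_1 \vee x_2) \wedge (x_1 \vee \bar{x}_2) \wedge (\bar{x}_1 \vee \bar{x}_2) \tag{27}$$

The evolution matrix $\mathbf{H}$ is a $9 \times 9$ matrix with 16 non-zero entries,

$$\langle s, u|\mathbf{H}|u, u\rangle = \langle u, s|\mathbf{H}|u, u\rangle = \frac{1}{2} , \quad \forall\, s = t, f \tag{28}$$

$$\langle s, s'|\mathbf{H}|s, u\rangle = \langle s, s'|\mathbf{H}|u, s'\rangle = \frac{1}{2} , \quad \forall\, s, s' = t, f \tag{29}$$

$$\langle s', s|\mathbf{H}|s', s\rangle = 1 , \quad \forall\, s, s' = t, f . \tag{30}$$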

We now explain how these matrix elements were obtained. From the
undetermined state $|u,u\rangle$, any of the four clauses can be chosen by the
heuristic. Thus, any of the two literals $x_1$, $x_2$ has a probability 1/2 to be chosen:
$h(1|u,u) = h(2|u,u) = \frac{1}{2}$. Next, unit-propagation will set the unassigned
variable to true or false with equal probabilities 1/2 (29). Finally, entries
corresponding to violating states in eqn (30) are calculated according to rule (11).

The branch function $B(T)$ equals 1 for $T = 0$, 2 for any $T \ge 1$; thus, $T^* = 1$
and $B^* = 2$, in agreement with the associated search tree symbolized in
Figure 7B.

We now introduce an instance with a non unique refutation tree.

Example 11 Instance with $N = 3$ variables, and two refutation trees.

$$I_3 = (x_1 \vee x_2) \wedge (\bar{x}_1 \vee x_2) \wedge (x_1 \vee \bar{x}_2) \wedge (\bar{x}_1 \vee \bar{x}_2) \wedge (x_3 \vee \bar{x}_3) \tag{31}$$



Notice the presence of a (trivial) clause containing opposite literals, which 
allows us to obtain a variety in the search trees without considering more than 
three variables. The evolution matrix H is a 27 x 27 matrix with 56 non zero 
entries (for the GUC heuristic). 



$$\langle s, u, u|\mathbf{H}|u, u, u\rangle = \langle u, s, u|\mathbf{H}|u, u, u\rangle = \frac{2}{5} , \quad \forall\, s = t, f \tag{32}$$

$$\langle u, u, s|\mathbf{H}|u, u, u\rangle = \frac{1}{5} , \quad \forall\, s = t, f \tag{33}$$

$$\langle s, s', s''|\mathbf{H}|s, u, s''\rangle = \langle s, s', s''|\mathbf{H}|u, s', s''\rangle = \frac{1}{2} , \quad \forall\, s, s' = t, f ;\ s'' = u, t, f$$

$$\langle s', u, s|\mathbf{H}|u, u, s\rangle = \langle u, s', s|\mathbf{H}|u, u, s\rangle = \frac{1}{2} , \quad \forall\, s, s' = t, f$$

$$\langle s, s', s''|\mathbf{H}|s, s', s''\rangle = 1 , \quad \forall\, s, s' = t, f ;\ s'' = u, t, f$$

[Figure 7: panels A, B and C; each search tree is rooted at the empty assignment.]

Fig. 7. Refutation search trees associated to instances $I_1$, $I_2$ and $I_3$. Grey and
black nodes correspond to variables assigned through unit-propagation and split
respectively, as in Figure 5. A. Example 9: refutation of instance $I_1$ is obtained as a
result of unit-propagation. The size (number of leaves) of the search tree is $B = 1$.
B. Example 10: search tree generated by DPLL on instance $I_2$. The black and grey
nodes correspond to the split of $x_1$ and unit-propagation over $x_2$, or vice-versa. The
size of the tree is $B = 2$. C. Example 11: search tree corresponding to the instance
$I_3$ when DPLL first splits variable $x_3$. The size of the tree is $B = 4$. If the first split
variable is $x_1$ or $x_2$, the refutation search tree of instance $I_3$ corresponds to case B.

The first split variable is $x_3$ if the last clause is chosen (probability 1/5), or
$x_1$ or $x_2$ otherwise (with probability 2/5 each), leading to expressions (32) and
(33). The remaining entries of $\mathbf{H}$ are obtained in the same way as explained
in Example 10.

We obtain $B(0) = 1$, $B(1) = 2$ and $B(T \ge 2) = 12/5$. Therefore, $T^* = 2$ and

$$B^* = \frac{12}{5} = \frac{4}{5} \times 2 + \frac{1}{5} \times 4 , \tag{34}$$



where the different contributions to $B^*$ and their probabilities are explicitly
written down, see Figures 7B and 7C.
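
The three examples can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the evolution rules as they are described in Figure 6 and in Examples 9 to 11 (a violating state is left unchanged; a unit clause, picked uniformly, forces its literal; otherwise a clause of minimal length and one of its literals are picked uniformly, and both truth values of the chosen variable are kept as branches). The function names and the signed-integer encoding of literals are illustrative choices, and the placement of negations in $I_2$ and $I_3$ is an assumption.

```python
from fractions import Fraction

def reduce_clause(clause, state):
    """Return None if the clause is satisfied, else its list of unassigned literals."""
    remaining = []
    for lit in clause:                       # lit = +j (x_j) or -j (negated x_j), j >= 1
        value = state[abs(lit) - 1]
        if value == 'u':
            remaining.append(lit)
        elif (value == 't') == (lit > 0):
            return None                      # one true literal satisfies the clause
    return remaining

def step(weights, instance):
    """Apply the assumed evolution rules once to a dict {partial state: weight}."""
    new = {}
    for state, w in weights.items():
        reduced = [r for r in (reduce_clause(c, state) for c in instance) if r is not None]
        if not reduced or any(len(r) == 0 for r in reduced):
            # Rule A: a violating state (or one with no clause left) is unchanged.
            new[state] = new.get(state, 0) + w
            continue
        shortest = min(len(r) for r in reduced)
        pool = [r for r in reduced if len(r) == shortest]
        for r in pool:
            for lit in r:
                prob = Fraction(1, len(pool) * len(r))   # uniform clause, uniform literal
                j = abs(lit) - 1
                if shortest == 1:
                    values = ['t' if lit > 0 else 'f']   # rule B: unit propagation
                else:
                    values = ['t', 'f']                  # rule C: split, both branches kept
                for v in values:
                    child = state[:j] + (v,) + state[j + 1:]
                    new[child] = new.get(child, 0) + w * prob
    return new

def branch_function(instance, n_vars, t_max):
    """Return [B(0), B(1), ..., B(t_max)], the total weight after each step."""
    weights = {('u',) * n_vars: Fraction(1)}
    values = [sum(weights.values())]
    for _ in range(t_max):
        weights = step(weights, instance)
        values.append(sum(weights.values()))
    return values

I1 = [[1], [-1]]
I2 = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
I3 = I2 + [[3, -3]]
print(branch_function(I1, 1, 3))
print(branch_function(I2, 2, 3))
print(branch_function(I3, 3, 4))
```

Under these assumptions the branch functions saturate at 1, 2 and 12/5 respectively, in agreement with the values of $B^*$ quoted in Examples 9, 10 and 11.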



3.4 Dynamical annealing approximation 



Let us denote by $\bar q$ the expectation value of a function $q$ of the instance $I$ over
the random 3-SAT distribution, at given numbers of variables, $N$, and clauses,
$\alpha N$. From Theorem 6, the expectation value of the size of the refutation tree
is

$$\bar B^*(\alpha, N) = \overline{B^*} = \langle \Sigma|\, \overline{\mathbf{H}^N}\, |U\rangle . \tag{35}$$

Calculation of the expectation value of the $N^{th}$ power of $\mathbf{H}$ is a hard task that
we were unable to perform for large sizes $N$. We therefore turned to a
simplifying approximation, hereafter called dynamical annealing. This approximation
is not thought to be justified in general, but may be asymptotically exact in
some limiting cases we will expose later on.

A first temptation is to approximate the expectation of the $N^{th}$ power of $\mathbf{H}$
with the $N^{th}$ power of the expectation of $\mathbf{H}$. This is however too brutal an
approximation to be meaningful, and a more refined scheme is needed.

Definition 12 Clause projection operator

Consider an instance $I$ of the 3-SAT problem. The clause vector $\vec C(S)$ of a
partial state $S$ is a three dimensional vector $\vec C = (C_1, C_2, C_3)$ where $C_j$ is the
number of undetermined clauses of $I$ of type $j$. The clause projection operator,
$\mathbf{P}(\vec C)$, is the operator acting on $V$ and projecting onto the subspace of partial
state vectors with clause vectors $\vec C$,

$$\mathbf{P}(\vec C)\, |S\rangle = \prod_{j=1}^{3} \delta_{C_j - C_j(S)}\ |S\rangle , \tag{36}$$

where $\delta$ is the Kronecker function. The sum of all state vectors in the spanning
basis with clause vector $\vec C$ is denoted by $|\Sigma(\vec C)\rangle = \mathbf{P}(\vec C)\, |\Sigma\rangle$. The sum of all
state vectors in the spanning basis with clause vector $\vec C$ and $U$ undetermined
variables is denoted by $|\Sigma_U(\vec C)\rangle$.

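It is an easy check that $\mathbf{P}$ is indeed a projection operator: $\mathbf{P}^2(\vec C) = \mathbf{P}(\vec C)$. As
the set of partial states can be partitioned according to their clause vectors,

$$\sum_{\vec C} \mathbf{P}(\vec C) = \sum_{\vec C} \mathbf{P}^2(\vec C) = \mathbf{1} . \tag{37}$$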
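
The partition of partial states by clause vector can be made concrete on the two-variable instance of Example 10. The sketch below is illustrative only: it assumes that a clause "of type $j$" is a clause that is not yet satisfied and still contains $j$ unassigned literals, and it reuses a signed-integer encoding of literals that is not part of the original text (the placement of negations in $I_2$ is also an assumption).

```python
from itertools import product
from collections import Counter

def clause_vector(state, instance):
    """Return (C_1, C_2, C_3): counts of unsatisfied clauses with 1, 2, 3 unassigned literals."""
    counts = [0, 0, 0]
    for clause in instance:
        satisfied = False
        unassigned = 0
        for lit in clause:                    # lit = +j (x_j) or -j (negated x_j)
            value = state[abs(lit) - 1]
            if value == 'u':
                unassigned += 1
            elif (value == 't') == (lit > 0):
                satisfied = True
        if not satisfied and unassigned > 0:
            counts[unassigned - 1] += 1
    return tuple(counts)

I2 = [[1, 2], [-1, 2], [1, -2], [-1, -2]]
states = list(product('utf', repeat=2))       # the 3^2 = 9 partial states
groups = Counter(clause_vector(s, I2) for s in states)
print(groups)
```

Each of the 9 basis states falls in exactly one group, which is the content of identity (37): the projectors onto the groups sum to the identity.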



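We now introduce the clause vector-dependent branch function

$$B(\vec C, T) = \langle \Sigma(\vec C)|\mathbf{H}^T|U\rangle . \tag{38}$$

Summation of the $B$'s over all $\vec C$ gives back function (13) from identity (37).
The evolution equation for $B(\vec C, T)$ is,

$$\begin{aligned}
B(\vec C, T+1) &= \langle \Sigma(\vec C)|\, \mathbf{H} \times \mathbf{H}^T |U\rangle \\
&= \langle \Sigma(\vec C)|\, \mathbf{H} \times \Big( \sum_{\vec C'} \mathbf{P}^2(\vec C') \Big) \times \mathbf{H}^T |U\rangle \\
&= \sum_{\vec C'} \langle \Sigma(\vec C)|\, \mathbf{H} \times \mathbf{P}(\vec C') \times \Big( \sum_S |S\rangle\langle S| \Big) \times \mathbf{P}(\vec C') \times \mathbf{H}^T |U\rangle \\
&= \sum_{\vec C'} \sum_S \langle \Sigma(\vec C)|\, \mathbf{H} \times \mathbf{P}(\vec C') |S\rangle\ \langle S|\, \mathbf{P}(\vec C') \times \mathbf{H}^T |U\rangle
\end{aligned} \tag{39}$$

where we have made use of identities (8) and (37). We are now ready to do
the two following approximation steps: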

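Approximation 13 Dynamical annealing (step A)

Substitute in equation (39) the partial state vector

$$\mathbf{P}(\vec C')\, |S\rangle \quad \text{with} \quad \frac{1}{\langle \Sigma|\Sigma_{N-T}(\vec C')\rangle}\ |\Sigma_{N-T}(\vec C')\rangle , \tag{40}$$

that is, with its average over the set of basis vectors with clause vector $\vec C'$ and
$N - T$ undetermined variables.

Following step A, equation (39) becomes an approximated evolution equation
for $B$,

$$B(\vec C, T+1) = \sum_{\vec C'} \bar H[\vec C, \vec C'; T]\ B(\vec C', T) , \tag{41}$$

where the new evolution matrix $\bar H$, not to be confused with $\mathbf{H}$, is

$$\bar H[\vec C, \vec C'; T] = \frac{\langle \Sigma(\vec C)|\mathbf{H}|\Sigma_{N-T}(\vec C')\rangle}{\langle \Sigma|\Sigma_{N-T}(\vec C')\rangle} . \tag{42}$$

Then,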

Approximation 14 Dynamical annealing (step B)

Substitute in equation (41) the evolution matrix $\bar H$ with

$$\bar H[\vec C, \vec C'; T] \to \frac{\overline{\langle \Sigma(\vec C)|\mathbf{H}|\Sigma_{N-T}(\vec C')\rangle}}{\overline{\langle \Sigma|\Sigma_{N-T}(\vec C')\rangle}} , \tag{43}$$

that is, consider the instance $I$ is redrawn at each time step $T \to T + 1$, keeping
information about clause vectors at time $T$ only.

Let us interpret what we have done so far. The quantity we focus on is
$\bar B(\vec C; T+1)$, the expectation number of branches at depth $T$ in the search
tree (Figure 5) carrying partial states with clause vector $\vec C = (C_1, C_2, C_3)$.
Within the dynamical annealing approximation, the evolution of the $\bar B$'s is
Markovian,

$$\bar B(\vec C; T+1) = \sum_{\vec C'} \bar H[\vec C, \vec C'; T]\ \bar B(\vec C'; T) . \tag{44}$$

The entries of the evolution matrix $\bar H[\vec C, \vec C'; T]$ can be interpreted as the
average number of branches with clause vector $\vec C$ that DPLL will generate
through the assignment of one variable from a partial assignment (partial
state) of variables with clause vector $\vec C'$.



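For the GUC heuristic, we find (Cocco and Monasson, 2001),

$$\begin{aligned}
\bar H[\vec C, \vec C'; T] ={}& \binom{C'_3}{C'_3 - C_3} \left( \frac{3}{N-T} \right)^{C'_3 - C_3} \left( 1 - \frac{3}{N-T} \right)^{C_3}
\sum_{w_2=0}^{C'_3 - C_3} \left( \frac{1}{2} \right)^{C'_3 - C_3} \binom{C'_3 - C_3}{w_2} \\
& \times \Bigg\{ (1 - \delta_{C'_1}) \left( 1 - \frac{1}{2(N-T)} \right)^{C'_1 - 1}
\sum_{z_2=0}^{C'_2} \binom{C'_2}{z_2} \left( \frac{2}{N-T} \right)^{z_2} \left( 1 - \frac{2}{N-T} \right)^{C'_2 - z_2} \\
& \qquad \times \sum_{w_1=0}^{z_2} \left( \frac{1}{2} \right)^{z_2} \binom{z_2}{w_1}\, \delta_{C_2 - C'_2 - w_2 + z_2}\, \delta_{C_1 - C'_1 - w_1 + 1} \\
& \quad + \delta_{C'_1} \sum_{z_2=0}^{C'_2 - 1} \binom{C'_2 - 1}{z_2} \left( \frac{2}{N-T} \right)^{z_2} \left( 1 - \frac{2}{N-T} \right)^{C'_2 - 1 - z_2} \\
& \qquad \times \sum_{w_1=0}^{z_2} \left( \frac{1}{2} \right)^{z_2} \binom{z_2}{w_1}\, \delta_{C_2 - C'_2 - w_2 + z_2 + 1} \left[ \delta_{C_1 - w_1} + \delta_{C_1 - 1 - w_1} \right] \Bigg\} ,
\end{aligned} \tag{45}$$

where $\delta_X$ denotes the Kronecker delta function over integers $X$: $\delta_X = 1$ if $X = 0$,
$\delta_X = 0$ otherwise. Expression (45) is easy to obtain from the interpretation
following equation (44).

3.5 Generating functions and asymptotic scalings at large N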



Let us introduce the generating function $G(\vec y; T)$ of the average number of
branches $\bar B(\vec C; T)$ where $\vec y = (y_1, y_2, y_3)$, through

$$G(\vec y; T) = \sum_{\vec C} e^{\,\vec y \cdot \vec C}\ \bar B(\vec C, T) . \tag{46}$$



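Evolution equation (41) for the $\bar B$'s can be rewritten in terms of the generating
function $G$,

$$G(\vec y; T+1) = e^{-\gamma_1(\vec y)}\, G(\vec\gamma(\vec y); T)
+ \left( e^{-\gamma_2(\vec y)} (e^{y_1} + 1) - e^{-\gamma_1(\vec y)} \right) G(-\infty, \gamma_2(\vec y), \gamma_3(\vec y); T) \tag{47}$$

where $\vec\gamma$ is a vectorial function of argument $\vec y$ whose components read

$$\begin{aligned}
\gamma_1(\vec y) &= y_1 + \ln\left[ 1 - \frac{1}{2(N-T)} \right] , \\
\gamma_2(\vec y) &= y_2 + \ln\left[ 1 + \frac{2}{N-T} \left( \frac{e^{-y_2}}{2} (1 + e^{y_1}) - 1 \right) \right] , \\
\gamma_3(\vec y) &= y_3 + \ln\left[ 1 + \frac{3}{N-T} \left( \frac{e^{-y_3}}{2} (1 + e^{y_2}) - 1 \right) \right] .
\end{aligned} \tag{48}$$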
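
As a quick numerical companion to the shift functions (48), the sketch below evaluates them and checks the large-$N$ behaviour that is used later when passing to the continuous limit: $N(\gamma_j - y_j)$ tends to a finite quantity at fixed $t = T/N$. The formulas coded here are our reading of (48), and the function name `gamma` and the test values are illustrative assumptions.

```python
import math

def gamma(y, N, T):
    """Components (gamma_1, gamma_2, gamma_3) of the shift function, as read from (48)."""
    y1, y2, y3 = y
    r = N - T                                  # number of undetermined variables
    g1 = y1 + math.log(1 - 1 / (2 * r))
    g2 = y2 + math.log(1 + (2 / r) * (math.exp(-y2) * (1 + math.exp(y1)) / 2 - 1))
    g3 = y3 + math.log(1 + (3 / r) * (math.exp(-y3) * (1 + math.exp(y2)) / 2 - 1))
    return g1, g2, g3

# Large-N check at fixed t = T/N: N * (gamma_3 - y_3) approaches
# 3/(1-t) * (exp(-y3) * (1 + exp(y2)) / 2 - 1).
N, t = 10 ** 6, 0.25
y = (0.3, -0.2, 0.5)
g1, g2, g3 = gamma(y, N, int(t * N))
limit3 = 3 / (1 - t) * (math.exp(-y[2]) * (1 + math.exp(y[1])) / 2 - 1)
print(N * (g3 - y[2]), limit3)
```

At $\vec y = 0$ the shifts $\gamma_2$ and $\gamma_3$ vanish identically, which reflects the fact that summing over all clause vectors does not depend on how 2- and 3-clauses are redistributed.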



To solve equation (47), we infer the large $N$ behaviour of $G$ from the following
remarks:

(1) Each time DPLL assigns variables through splitting or unit-propagation,
the numbers $C_j$ of clauses of length $j$ undergo $O(1)$ changes. It is thus
sensible to assume that, when the number of assigned variables increases
from $T_1 = tN$ to $T_2 = tN + \Delta T$ with $\Delta T$ very large but $o(N)$, e.g.
$\Delta T = \sqrt{N}$, the densities $c_2 = C_2/N$ and $c_3 = C_3/N$ of 2- and 3-clauses
have been modified by $o(1)$.

(2) On the same time interval $T_1 < T < T_2$, we expect the number of
unit-clauses $C_1$ to vary at each time step. But its distribution $\rho(C_1|c_2, c_3; t)$,
conditioned to the densities $c_2$, $c_3$ and the reduced time $t$, should reach
some well defined limit distribution. This claim is a generalization of the
result obtained by Frieze and Suen (1996) for the analysis of the GUC
heuristic in the absence of backtracking.

(3) As long as a partial state does not violate the instance, very few
unit-clauses are generated, and splitting frequently occurs. In other words, the
probability that $C_1 = 0$ is strictly positive as $N$ gets large.

The above arguments entice us to make the following

Claim 15 Asymptotic expression for the generating function G

For large $N$, $T$ at fixed ratio $t = T/N$, the generating function (46) of the
average numbers $\bar B$ of branches is expected to behave as

$$G(y_1, y_2, y_3; tN) = \exp\left[ N \varphi(y_2, y_3; t) + \psi(y_1, y_2, y_3; t) + o(1) \right] . \tag{49}$$



Hypothesis (49) expresses in a concise way some important information on
the distribution of clause populations during the search process that we now
extract. Call $\omega$ the Legendre transform of $\varphi$,

$$\omega(c_2, c_3; t) = \min_{y_2, y_3} \left[ \varphi(y_2, y_3; t) - y_2\, c_2 - y_3\, c_3 \right] . \tag{50}$$



Then, combining equations (46), (49) and (50), we obtain

$$\sum_{C_1 \ge 0} \rho(C_1|c_2, c_3; t)\ \bar B(C_1, c_2 N, c_3 N; tN) \asymp \exp\left[ N \omega(c_2, c_3; t) \right] , \tag{51}$$

up to non exponential in $N$ corrections. In other words, the expectation value
of the number of branches carrying partial states with $(1 - t)N$ undetermined
variables and $c_j N$ $j$-clauses ($j = 2, 3$) scales exponentially with $N$, with a
growth function $\omega(c_2, c_3; t)$ related to $\varphi(y_2, y_3; t)$ through identity (50).
Moreover, $\varphi(0, 0; t)$ is the logarithm of the number of branches (divided by $N$) after
a fraction $t$ of variables have been assigned. The most probable values of the
densities $c_j(t)$ of $j$-clauses are then obtained from the partial derivatives of $\varphi$:
$c_j(t) = \partial\varphi/\partial y_j (0, 0)$ for $j = 2, 3$.

Let us emphasize that $\varphi$ in equation (49) does not depend on $y_1$. This
hypothesis simply expresses that, as far as non violating partial states are concerned,
both terms on the right hand side of (47) are of the same order, and that the
density of unit-clauses, $c_1 = \partial\varphi/\partial y_1$, identically vanishes.

Similarly, function $\psi(y_1, y_2, y_3; t)$ is related to the generating function of
distribution $\rho(C_1|c_2, c_3; t)$,

$$\sum_{C_1 \ge 0} \rho(C_1|c_2, c_3; t)\ e^{y_1 C_1} = e^{\psi(y_1, y_2, y_3; t) - \psi(0, y_2, y_3; t)} , \tag{52}$$

where $c_j = \partial\varphi/\partial y_j (y_2, y_3; t)$ ($j = 2, 3$) on the left hand side of the above
formula.

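Inserting expression (49) into the evolution equation (47), we find

$$\begin{aligned}
\frac{\partial\varphi}{\partial t}(y_2, y_3; t) ={}& -y_1
+ \frac{2}{1-t} \left[ \frac{e^{-y_2}}{2} (1 + e^{y_1}) - 1 \right] \frac{\partial\varphi}{\partial y_2}(y_2, y_3; t)
+ \frac{3}{1-t} \left[ \frac{e^{-y_3}}{2} (1 + e^{y_2}) - 1 \right] \frac{\partial\varphi}{\partial y_3}(y_2, y_3; t) \\
& + \ln\left[ 1 + K(y_1, y_2)\ e^{\psi(-\infty, y_2, y_3; t) - \psi(y_1, y_2, y_3; t)} \right]
\end{aligned} \tag{53}$$

where $K(y_1, y_2) = e^{-y_2} (e^{2 y_1} + e^{y_1}) - 1$. As $\varphi$ does not depend upon $y_1$, the
latter may be chosen at our convenience e.g. to cancel $K$ and the contribution
from the last term in equation (53),

$$y_1 = Y_1(y_2) = y_2 - \ln\left[ \frac{1 + \sqrt{1 + 4 e^{y_2}}}{2} \right] . \tag{54}$$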
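
The choice (54) can be checked numerically. The short sketch below assumes the expressions $K(y_1, y_2) = e^{-y_2}(e^{2y_1} + e^{y_1}) - 1$ and $Y_1(y_2) = y_2 - \ln[(1 + \sqrt{1 + 4e^{y_2}})/2]$ as read from the text, and verifies that $K$ vanishes along $y_1 = Y_1(y_2)$; the function names are illustrative.

```python
import math

def K(y1, y2):
    """Kernel multiplying the psi-dependent term in equation (53)."""
    return math.exp(-y2) * (math.exp(2 * y1) + math.exp(y1)) - 1

def Y1(y2):
    """Value of y_1 that cancels the kernel, equation (54)."""
    return y2 - math.log((1 + math.sqrt(1 + 4 * math.exp(y2))) / 2)

for y2 in (-2.0, -0.5, 0.0, 0.7, 1.5):
    print(y2, Y1(y2), K(Y1(y2), y2))
```

The cancellation follows from $z = e^{Y_1}$ being the positive root of $z^2 + z = e^{y_2}$. At $y_2 = 0$ this root is the inverse of the golden ratio, so that $Y_1(0) = -\ln[(1 + \sqrt 5)/2]$.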



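Such a procedure, sometimes called kernel method and, to our knowledge, first
proposed by Knuth (1968), is correct in the major part of the $y_2, y_3$ space and,
in particular, in the vicinity of (0, 0) we focus on in this paper (see footnote). We end up
with the following partial differential equation (PDE) for $\varphi$,

$$\frac{\partial\varphi}{\partial t}(y_2, y_3; t) = H\left[ \frac{\partial\varphi}{\partial y_2}, \frac{\partial\varphi}{\partial y_3}, y_2, y_3, t \right] , \tag{55}$$

where $H$ incorporates the details of the splitting heuristic (see footnote for the UC heuristic),

$$H_{GUC}[c_2, c_3, y_2, y_3, t] = -Y_1(y_2)
+ \frac{3 c_3}{1-t} \left[ \frac{e^{-y_3}}{2} \left( 1 + e^{y_2} \right) - 1 \right]
+ \frac{c_2}{1-t} \left( e^{-Y_1(y_2)} - 2 \right) . \tag{57}$$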



We must therefore solve the partial differential equation (PDE) (55) with the
initial condition,

$$\varphi(y_2, y_3, t = 0) = \alpha_0\, y_3 , \tag{58}$$

obtained through inverse Legendre transform (50) of the initial condition over
$\bar B$, or equivalently over $\omega$,

$$\omega(c_2, c_3; t = 0) = \begin{cases} 0 & \text{if } c_3 = \alpha_0 , \\ -\infty & \text{if } c_3 \ne \alpha_0 . \end{cases} \tag{59}$$

Footnote (kernel method): It has however to be modified in a small region of the
$y_2, y_3$ space; a complete analysis of this case was carried out by Cocco and
Monasson (2001b).

Footnote (splitting heuristic): For the UC heuristic,

$$H_{UC} = \ln 2 + \frac{3 c_3}{1-t} \left[ \frac{e^{-y_3}}{2} \left( 1 + e^{y_2} \right) - 1 \right] + \frac{c_2}{1-t} \left( \frac{3}{2}\, e^{-y_2} - 2 \right) . \tag{56}$$

Fig. 8. Snapshots of the surface $\omega(p, \alpha; t)$ for $\alpha_0 = 10$ at three different times i.e.
depths in the tree, $t = 0.01$, $0.05$ and $0.09$ (from left to right, top to down). The
height $\omega^*(t)$ of the top of the surface, with coordinates $p^*(t), \alpha^*(t)$, is the logarithm
(divided by $N$) of the number of branches. The coordinates $(p^*(t), \alpha^*(t))$ define the
tree trajectory shown in Figure 3. The halt line is hit at $t_h = 0.094$. Note the overall
growth of the surface $\omega(p, \alpha; t)$ with time (beware of the change of scales between
figures).



3.6 Interpretation in terms of growth process



We can interpret the dynamical annealing approximation made in the previous
paragraphs, and the resulting PDE (55) as a description of the growth process
of the search tree resulting from DPLL operation. Using Legendre transform
(50), PDE (55) can be written as an evolution equation for the logarithm
$\omega(c_2, c_3, t)$ of the average number of branches with parameters $c_2, c_3$ as the
depth $t = T/N$ increases,

$$\frac{\partial\omega}{\partial t}(c_2, c_3, t) = H\left[ c_2, c_3, -\frac{\partial\omega}{\partial c_2}, -\frac{\partial\omega}{\partial c_3}, t \right] . \tag{60}$$



Partial differential equation (PDE) (60) is analogous to growth processes
encountered in statistical physics (McKane, Droz, Vannimenus and Wolf, 1995).



The surface $\omega$, growing with "time" $t$ above the plane $c_2, c_3$, or equivalently
from (3), above the plane $p, \alpha$ (Figure 8), describes the whole distribution of
branches. The average number of branches at depth $t$ in the tree equals

$$\bar B(t) = \int_0^1 dp \int d\alpha\ e^{N \omega(p, \alpha; t)} \asymp e^{N \omega^*(t)} , \tag{61}$$

where $\omega^*(t)$ is the maximum over $p, \alpha$ of $\omega(p, \alpha; t)$ reached in $p^*(t), \alpha^*(t)$. In
other words, the exponentially dominant contribution to $\bar B(t)$ comes from
branches carrying 2+p-SAT instances with parameters $p^*(t), \alpha^*(t)$, that is
clause densities $c_2^*(t) = \alpha^*(t)(1 - p^*(t))$, $c_3^*(t) = \alpha^*(t)\, p^*(t)$. Parametric plot of
$p^*(t), \alpha^*(t)$ as a function of $t$ defines the tree trajectories on Figure 3.
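
The statement that the integral in (61) is dominated by the top of the surface can be illustrated on a toy example. The sketch below does not use the actual solution of PDE (60): `toy_omega` is a hypothetical concave surface with maximum 0.1, chosen only to show that $(1/N) \ln \int e^{N\omega}$ approaches the maximum of $\omega$ as $N$ grows, with corrections of order $(\ln N)/N$. The grid size and the integration range in $\alpha$ are arbitrary assumptions.

```python
import math

def growth_rate(omega, N, grid=200, a_max=4.0):
    """(1/N) * ln of a midpoint Riemann sum of exp(N * omega) over p in [0,1], alpha in [0,a_max]."""
    ps = [(i + 0.5) / grid for i in range(grid)]
    alphas = [a_max * (i + 0.5) / grid for i in range(grid)]
    values = [omega(p, a) for p in ps for a in alphas]
    top = max(values)                          # factor out the maximum to avoid overflow
    cell = (1.0 / grid) * (a_max / grid)
    total = sum(math.exp(N * (v - top)) for v in values) * cell
    return top + math.log(total) / N

def toy_omega(p, a):
    """Hypothetical surface with a single maximum omega* = 0.1 at (p, alpha) = (0.6, 2)."""
    return 0.1 - (p - 0.6) ** 2 - 0.5 * (a - 2.0) ** 2

for N in (200, 2000):
    print(N, growth_rate(toy_omega, N))
```

The estimate moves towards 0.1 when $N$ increases, which is the sense in which only the coordinates $p^*(t), \alpha^*(t)$ of the maximum matter at exponential order.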



The hyperbolic line in Figure 3 indicates the halt points, where contradictions
prevent dominant branches from further growing. Each time DPLL assigns a
variable through unit-propagation, an average number $\mu(p, \alpha)$ of new 1-clauses
is produced, resulting in a net rate of $\mu - 1$ additional 1-clauses. As long as $\mu < 1$,
1-clauses are quickly eliminated and do not accumulate. Conversely, if $\mu > 1$,
1-clauses tend to accumulate. Opposite 1-clauses $x$ and $\bar x$ are likely to appear,
leading to a contradiction (Chao and Franco, 1990; Frieze and Suen, 1996).
The halt line is defined through $\mu(p, \alpha) = 1$, and reads (Cocco and Monasson,
2001),

$$\alpha = \frac{3 + \sqrt 5}{2}\ \ln\left[ \frac{1 + \sqrt 5}{2} \right] \frac{1}{1 - p} . \tag{62}$$



It differs from the halt line $\alpha = 1/(1 - p)$ corresponding to a single branch
(Frieze and Suen, 1996). As far as dominant branches are concerned, an
alternative and simpler way of obtaining the halt criterion is through calculation
of the probability $\rho_s^*(t) = \rho(C_1 = 0|c_2^*(t), c_3^*(t); t)$ that a split occurs when a
variable is assigned by DPLL,

$$\rho_s^*(t) = \exp\left( \frac{\partial\varphi}{\partial t}(0, 0; t) \right) - 1 , \tag{63}$$

from equations (52,53). The probability of split vanishes, and unit-clauses
accumulate till a contradiction is obtained, when the tree stops growing. Along
the tree trajectory, $\omega^*(t)$ grows thus from 0, on the right vertical axis, up to
some final positive value, $\omega_{THE}$, on the halt line. $\omega_{THE}$ is our theoretical
prediction for the logarithm of the complexity (divided by $N$) (see footnote).
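
The two routes to the halt line can be compared numerically. The sketch below is a consistency check under stated assumptions, not the authors' derivation: it takes the GUC growth function in the form $H_{GUC} = -Y_1(y_2) + \frac{3c_3}{1-t}[\frac{e^{-y_3}}{2}(1+e^{y_2}) - 1] + \frac{c_2}{1-t}(e^{-Y_1(y_2)} - 2)$ (our reading of equation (57)), imposes a vanishing split probability in (63), i.e. $\partial\varphi/\partial t(0,0;t) = 0$, and assumes that $\alpha(1-p)$ stands for the number of 2-clauses per undetermined variable, $c_2/(1-t)$. This last identification is our reading of the change of variables (3), which is not reproduced in this section.

```python
import math

def Y1(y2):
    """Kernel-method value of y_1, as in equation (54)."""
    return y2 - math.log((1 + math.sqrt(1 + 4 * math.exp(y2))) / 2)

def H_guc(c2, c3, y2, y3, t):
    """GUC growth function, in the form assumed above for equation (57)."""
    y1 = Y1(y2)
    three_clauses = 3 * c3 / (1 - t) * (math.exp(-y3) * (1 + math.exp(y2)) / 2 - 1)
    two_clauses = c2 / (1 - t) * (math.exp(-y1) - 2)
    return -y1 + three_clauses + two_clauses

# Halt criterion: H_guc = 0 at y2 = y3 = 0, solved for x = c2 / (1 - t).
x_halt = -Y1(0.0) / (2 - math.exp(-Y1(0.0)))

# Constant appearing in the halt line (62): alpha * (1 - p) = x_62.
golden = (1 + math.sqrt(5)) / 2
x_62 = (3 + math.sqrt(5)) / 2 * math.log(golden)

print(x_halt, x_62)
```

Both numbers are close to 1.26, larger than the value 1 that defines the single-branch halt line $\alpha = 1/(1-p)$.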

Equation (60) was solved using the method of characteristics. Using eqn. (3),
we have plotted the surface $\omega$ at different times, with the results shown in
Figure 8 for $\alpha_0 = 10$. Values of $\omega_{THE}$, obtained for $4.3 < \alpha < 20$ by solving
equation (60) compare very well with numerical results (Table 1). We stress
that, though our calculation is not rigorous, it provides a very good
quantitative estimate of the complexity. It is therefore expected that our dynamical
annealing approximation be quantitatively accurate. It is a reasonable
conjecture that it becomes exact at large ratios $\alpha_0$, where PDE (55) can be exactly
solved:

Conjecture 16 Asymptotic equivalent of $\omega$ for large ratios

Resolution of PDE (60) in the large ratio limit gives (for the GUC
heuristic),

$$\omega_{THE}(\alpha_0) \asymp \frac{3 + \sqrt 5}{6 \ln 2} \left[ \ln\left( \frac{1 + \sqrt 5}{2} \right) \right]^2 \frac{1}{\alpha_0} . \tag{64}$$

This result exhibits the $1/\alpha_0$ scaling proven by Beame et al., and is
conjectured to be exact.

As $\alpha_0$ increases, search trees become smaller and smaller, and correlations
between branches, weaker and weaker, making dynamical annealing more and
more accurate.



4 Upper phase and mixed branch-tree trajectories.



The interest of the trajectory framework proposed in this paper is best seen
in the upper sat phase, that is, for ratios $\alpha_0$ ranging from $\alpha_L$ to $\alpha_C$. This
intermediate region juxtaposes branch and tree behaviors, see search tree in
Figures 2C and 9.

Footnote (theoretical prediction $\omega_{THE}$ of Section 3.6): Notice that we have to
divide the theoretical value by $\ln 2$ to match the definition used for numerical
experiments; this is done in Table 1.

[Figure 9 labels: satisfiable 3-SAT instance; satisfiable 2+p-SAT instances;
unsatisfiable 2+p-SAT instance; node G; total height of the search tree = O(N);
solution; UNSAT subtree.]

Fig. 9. Detailed structure of the search tree in the upper sat phase ($\alpha_L < \alpha < \alpha_C$).
DPLL starts with a satisfiable 3-SAT instance and transforms it into a sequence
of 2+p-SAT instances. The leftmost branch in the tree symbolizes the first descent
made by DPLL. Above node $G_0$, instances are satisfiable while below $G_1$, instances
have no solutions. A grey triangle accounts for the (exponentially) large refutation
subtree that DPLL has to go through before backtracking above $G_1$ and reaching $G_0$.
By definition, the highest node reached back by DPLL is $G_0$. Further backtracking,
below $G_0$, will be necessary but a solution will be eventually found (right subtree),
see Figure 2C.

The branch trajectory, started from the point $(p = 1, \alpha_0)$ corresponding to
the initial 3-SAT instance, hits the critical line $\alpha_C(p)$ at some point G with
coordinates $(p_G, \alpha_G)$ after $N t_G$ variables have been assigned by DPLL, see
Figure 3. The algorithm then enters the unsat phase and, with high probability,
generates a 2+p-SAT instance with no solution. A dense subtree that DPLL
has to go through entirely, forms beyond G till the halt line (left subtree in
Figure 9). The size of this subtree can be analytically predicted from the theory
exposed in Section 3. All calculations are identical, except initial condition (58)
which has to be changed into

$$\varphi(y_2, y_3, t = 0) = \alpha_G (1 - p_G)\, y_2 + \alpha_G\, p_G\, y_3 . \tag{65}$$

As a result we obtain the size $2^{N_G \omega_G}$ of the unsatisfiable subtree to be
backtracked (leftmost subtree in Figure 9). $N_G = N (1 - t_G)$ denotes the number
of undetermined variables at point G.

G is the highest backtracking node in the tree (Figures 2C and 9) reached
back by DPLL, since nodes above G are located in the sat phase and carry
2+p-SAT instances with solutions. DPLL will eventually reach a solution.
The corresponding branch (rightmost path in Figure 2C) is highly non typical
and does not contribute to the complexity, since almost all branches in the
search tree are described by the tree trajectory issued from G (Figure 3). We
expect that the computational effort DPLL requires to find a solution will, to
exponential order in $N$, be given by the size of the left unsatisfiable subtree of
Figure 9. In other words, massive backtracking will certainly be present in the
right subtree (the one leading to the solution), and no significant statistical
difference is expected between both subtrees.

We have experimentally checked this scenario for $\alpha_0 = 3.5$. The average
coordinates of the highest backtracking node, $(p_G = 0.78, \alpha_G = 3.02)$, coincide
with the computed intersection of the single branch trajectory (Section 2.2)
and the estimated critical line $\alpha_C(p)$ (Cocco and Monasson, 2001). As for
complexity, experimental measures of $\omega$ from 3-SAT instances at $\alpha_0 = 3.5$, and of
$\omega_G$ from 2+0.78-SAT instances at $\alpha_G = 3.02$, obey the expected identity

$$\omega_{THE} = \omega_G \times (1 - t_G) , \tag{66}$$

and are in very good agreement with theory (Table 1). Therefore, the structure
of search trees corresponding to instances of 3-SAT in the upper sat regime
reflects the existence of a critical line for 2+p-SAT instances.



5 Conclusions. 



In this paper, we have exposed a procedure to understand the complexity 
pattern of the backtrack resolution of the random Satisfiability problem (Fig- 
ure 10). Main steps are: 

(1) Identify the space of parameters in which the dynamical evolution takes 
place; this space will be generally larger than the initial parameter space 
since the algorithm modifies the instance structure. While the distribution 
of 3-SAT instances is characterized by the clause per variable ratio a only, 
another parameter p accounting for the emergence of 2-clauses has to be 
considered. 

(2) Divide the parameter space into different regions (phases) depending on
the output of the resolution e.g. sat/unsat phases for 2+p-SAT.

(3) Represent the action of the algorithm as trajectories in this phase
diagram. Intersection of trajectories with the phase boundaries allows one to
distinguish hard from easy regimes (Figure 10).

In addition, we have also presented a non rigorous study of the search tree
growth, which allows us to accurately estimate the complexity of resolution
in presence of massive backtracking. From a mathematical point of view,
it is worth noticing that monitoring the growth of the search tree requires
a PDE, while ODEs are sufficient to account for the evolution of a single
branch (Achlioptas, 2001b).



An interesting question raised by this picture is the robustness of the
polynomial/exponential crossover point T (Figure 3). While the ratio $\alpha_L$ separating
easy (polynomial) from hard (exponential) resolutions depends on the
heuristics used by DPLL ($\alpha_L^{GUC} \simeq 3.003$, $\alpha_L^{UC} = 8/3$), T appears to be located at
the same coordinates ($p_T = 2/5$, $\alpha_T = 5/3$) for all three UC, GUC, and SC$_1$
heuristics. From a technical point of view, the robustness of T comes from the
structure of the ODEs (2). The coordinates of T, and the time $t_T$ at which
the branch trajectory issued from $(p = 1, \alpha = \alpha_L)$ hits the critical line $\alpha_C(p)$
tangentially, obey the equations $\rho_1 = d\rho_1/dt = 0$ with $\rho_1 = 1 - \alpha(t)(1 - p(t))$.
The set of ODEs (2), combined with the previous conditions, gives $p_T = 2/5$
(Achlioptas, 2001b).
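
The coordinates of T can be checked explicitly for the UC heuristic. The sketch below relies on two assumptions that are not reproduced in this section: the standard solution of the single-branch ODEs for UC, $c_3(t) = \alpha_0 (1-t)^3$ and $c_2(t) = \frac{3}{2} \alpha_0\, t (1-t)^2$ (densities per initial variable), and the change of variables $\alpha = (c_2 + c_3)/(1-t)$, $p = c_3/(c_2 + c_3)$. Function names are illustrative.

```python
from fractions import Fraction

def uc_densities(alpha0, t):
    """Assumed UC branch solution: 2- and 3-clause densities per initial variable."""
    c3 = alpha0 * (1 - t) ** 3
    c2 = Fraction(3, 2) * alpha0 * t * (1 - t) ** 2
    return c2, c3

def to_phase_diagram(c2, c3, t):
    """Assumed change of variables to the (p, alpha) plane of the 2+p-SAT model."""
    alpha = (c2 + c3) / (1 - t)
    p = c3 / (c2 + c3)
    return p, alpha

def rho1(alpha0, t):
    """rho_1 = 1 - alpha(t) * (1 - p(t)) along the UC branch trajectory."""
    p, alpha = to_phase_diagram(*uc_densities(alpha0, t), t)
    return 1 - alpha * (1 - p)

alpha_L = Fraction(8, 3)                       # threshold ratio quoted for UC
t_T = Fraction(1, 2)
p_T, alpha_T = to_phase_diagram(*uc_densities(alpha_L, t_T), t_T)
print(p_T, alpha_T, rho1(alpha_L, t_T))
```

With $\alpha_0 = 8/3$ one finds $\rho_1(t) = (1 - 2t)^2$ along the trajectory, which vanishes together with its derivative at $t = 1/2$: the trajectory touches the line $\alpha = 1/(1-p)$ tangentially at $(p_T, \alpha_T) = (2/5, 5/3)$.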



This robustness explains why the polynomial/exponential crossover location
of critically constrained 2+p-SAT instances, which should a priori depend on
the algorithm used, was found by Monasson et al. (1999) to coincide roughly
with the algorithm-independent, tricritical point on the $\alpha_C(p)$ line.



Our approach has already been extended to other decision problems, e.g. the
vertex covering of random graphs (Hartmann and Weigt, 2001) or the
coloring of random graphs (Ein-Dor and Monasson, 2003) (see (Jia and Moore,
2003) for recent rigorous results on backtracking in this case). It is
important to stress that it is not limited to the determination of the average
solving time, but may also be used to capture its distribution (Gent and Walsh,
1994; Cocco and Monasson, 2002; Montanari and Zecchina, 2002) and to
understand the efficiency of restarts techniques (Gomes et al., 2000). Finally, we
emphasize that Theorem 6 relates the computational effort to the evolution
operator representing the elementary steps of the search heuristic for a given
instance. It is expected that this approach will be useful to obtain results on
the average-case complexity of DPLL at fixed instance, where the average is
performed over the random choices done by the algorithm only (Monasson,
2003).

[Figure 10: axis label, fraction of 3-clauses p.]

Fig. 10. Schematic representation of the resolution trajectories in the sat (branch
trajectories symbolized with dashed line) and unsat (tree trajectories represented
by hatched regions) phases. DPLL goes along branch trajectories in a linear time,
but takes an exponential time to go through tree trajectories. The mixed case of
hard sat instances corresponds to the crossing of the boundary separating the two
phases (bold line), which leads to the exploration of unsat subtrees before a solution
is finally found.